The xView2 competition and xBD dataset spurred significant advancements in overhead building damage detection, but the competition's pixel-level scoring can lead to reduced solution performance in areas with tight clusters of buildings or uninformative context. We seek to advance automatic building damage assessment for disaster relief by proposing an auxiliary challenge to the original xView2 competition. This new challenge involves a new dataset and metrics indicating solution performance when damage is more local and limited than in xBD. Our challenge measures a network's ability to identify individual buildings and their damage level without excessive reliance on the buildings' surroundings. Methods that succeed on this challenge will provide more fine-grained, precise damage information than original xView2 solutions. The performance of the best-performing xView2 networks dropped noticeably on our new limited/local damage detection task. The common causes of failure observed are that (1) building objects and their classifications are not separated well, and (2) when they are, the classification is strongly biased by surrounding buildings and other damage context. Thus, we release our augmented version of the dataset with additional object-level scoring metrics (https://gitlab.kitware.com/dennis.melamed/xfbd) to test independence and separability of building objects, alongside the pixel-level performance metrics of the original competition. We also experiment with new baseline models which improve independence and separability of building damage predictions. Our results indicate that building damage detection is not a fully solved problem, and we invite others to use and build on our dataset augmentations and metrics.
Translated by Google Translate
Purpose: Tracking the 3D motion of the surgical tool and the patient anatomy is a fundamental requirement for computer-assisted skull-base surgery. The estimated motion can be used both for intra-operative guidance and for downstream skill analysis. Recovering such motion solely from surgical videos is desirable, as it is compliant with current clinical workflows and instrumentation. Methods: We present Tracker of Anatomy and Tool (TAToo). TAToo jointly tracks the rigid 3D motion of the patient skull and the surgical drill from stereo microscopic videos. TAToo estimates motion via an iterative optimization process in an end-to-end differentiable form. For robust tracking performance, TAToo adopts a probabilistic formulation and enforces geometric constraints on the object level. Results: We validate TAToo both on simulation data, where ground truth motion is available, and on anthropomorphic phantom data, where optical tracking provides a strong baseline. We report sub-millimeter and millimeter inter-frame tracking accuracy for skull and drill, respectively, with rotation errors below 1°. We further illustrate how TAToo may be used in a surgical navigation setting. Conclusion: We present TAToo, which simultaneously tracks the surgical tool and the patient anatomy in skull-base surgery. TAToo directly predicts the motion from surgical videos, without the need for any markers. Our results show that the performance of TAToo compares favorably to competing approaches. Future work will include fine-tuning of our depth network to reach the 1 mm clinical accuracy goal desired for surgical applications in the skull base.
We present temporally layered architecture (TLA), a biologically inspired system for temporally adaptive distributed control. TLA layers a fast and a slow controller together to achieve temporal abstraction that allows each layer to focus on a different time-scale. Our design draws on the architecture of the human brain, which executes actions at different timescales depending on the environment's demands. Such distributed control design is widespread across biological systems because it increases survivability and accuracy in certain and uncertain environments. We demonstrate that TLA can provide many advantages over existing approaches, including persistent exploration, adaptive control, explainable temporal behavior, compute efficiency and distributed control. We present two different algorithms for training TLA: (a) closed-loop control, where the fast controller is trained over a pre-trained slow controller, allowing better exploration for the fast controller and closed-loop control where the fast controller decides whether to "act-or-not" at each timestep; and (b) partially open-loop control, where the slow controller is trained over a pre-trained fast controller, allowing for open-loop control where the slow controller picks a temporally extended action or defers the next n actions to the fast controller. We evaluate our method on a suite of continuous control tasks and demonstrate the advantages of TLA over several strong baselines.
Knowledge of the symmetries of reinforcement learning (RL) systems can be used to create compressed and semantically meaningful representations of a low-level state space. We present a method of automatically detecting RL symmetries directly from raw trajectory data without requiring active control of the system. Our method generates candidate symmetries and trains a recurrent neural network (RNN) to discriminate between the original trajectories and the transformed trajectories for each candidate symmetry. The RNN discriminator's accuracy for each candidate reveals how symmetric the system is under that transformation. This information can be used to create high-level representations that are invariant to all symmetries on a dataset level and to communicate properties of the RL behavior to users. We show in experiments on two simulated RL use cases (a pusher robot and a UAV flying in wind) that our method can determine the symmetries underlying both the environment physics and the trained RL policy.
Hidden parameters are latent variables in reinforcement learning (RL) environments that are constant over the course of a trajectory. Understanding which hidden parameters, if any, affect a particular environment can aid both the development and appropriate usage of RL systems. We present an unsupervised method to map RL trajectories into a feature space where distance represents the relative difference in system behavior due to hidden parameters. Our approach disentangles the effects of hidden parameters by leveraging a recurrent neural network (RNN) world model as used in model-based RL. First, we alter the standard world model training algorithm to isolate the hidden parameter information in the world model memory. Then, we use a metric learning approach to map the RNN memory into a space with a distance metric approximating a bisimulation metric with respect to the hidden parameters. The resulting disentangled feature space can be used to meaningfully relate trajectories to each other and analyze the hidden parameter. We demonstrate our approach on four hidden parameters across three RL environments. Finally, we present two methods to help identify and understand the effects of hidden parameters on systems.
Principal Component Analysis (PCA) and its exponential family extensions have three components: observations, latents and parameters of a linear transformation. We consider a generalised setting where the canonical parameters of the exponential family are a nonlinear transformation of the latents. We show explicit relationships between particular neural network architectures and the corresponding statistical models. We find that deep equilibrium models (DEQs), a recently introduced class of implicit neural networks, solve for maximum a posteriori (MAP) estimates of the latents and parameters of the transformation. Our analysis provides a systematic way to relate activation functions, dropout, and layer structure to statistical assumptions about the observations, thus providing foundational principles for unsupervised DEQs. For hierarchical latents, individual neurons can be interpreted as nodes in a deep graphical model. Our DEQ feature maps are end-to-end differentiable, enabling fine-tuning for downstream tasks.
Visual Inertial Odometry (VIO) is one of the most established state estimation methods for mobile platforms. However, when visual tracking fails, VIO algorithms quickly diverge due to rapid error accumulation during inertial data integration. This error is typically modeled as a combination of additive Gaussian noise and a slowly changing bias which evolves as a random walk. In this work, we propose to train a neural network to learn the true bias evolution. We implement and compare two common sequential deep learning architectures: LSTMs and Transformers. Our approach follows from recent learning-based inertial estimators, but, instead of learning a motion model, we target IMU bias explicitly, which allows us to generalize to locomotion patterns unseen in training. We show that our proposed method improves state estimation in visually challenging situations across a wide range of motions by quadrupedal robots, walking humans, and drones. Our experiments show an average 15% reduction in drift rate, with much larger reductions when there is total vision failure. Importantly, we also demonstrate that models trained with one locomotion pattern (human walking) can be applied to another (quadruped robot trotting) without retraining.
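The conventional error model mentioned in this abstract (additive Gaussian noise plus a bias that evolves as a random walk) can be sketched in a few lines. This is only an illustration of that classical model, not the learned bias predictor the abstract proposes. The function name `simulate_gyro`, the noise magnitudes, and the 100 Hz sample rate are assumptions chosen for the example.

```python
import numpy as np


def simulate_gyro(true_rate, dt, sigma_noise, sigma_bias, seed=0):
    """Corrupt a 1-D angular-rate signal with white noise and a random-walk bias.

    All parameter names and values are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    n = len(true_rate)
    # Random-walk bias: b[k] = b[k-1] + sigma_bias * sqrt(dt) * w[k], with b[0] = 0.
    steps = sigma_bias * np.sqrt(dt) * rng.standard_normal(n)
    steps[0] = 0.0
    bias = np.cumsum(steps)
    # Additive white Gaussian measurement noise.
    noise = sigma_noise * rng.standard_normal(n)
    measured = np.asarray(true_rate, dtype=float) + bias + noise
    return measured, bias


if __name__ == "__main__":
    dt = 0.01                    # 100 Hz IMU (assumed)
    true_rate = np.zeros(6000)   # sensor at rest for 60 s
    measured, bias = simulate_gyro(true_rate, dt, sigma_noise=0.01, sigma_bias=0.002)
    # Integrating the raw measurement accumulates heading error over time.
    heading_error = np.cumsum(measured - true_rate) * dt
    # Subtracting the (here, known) bias leaves only the integrated white noise.
    corrected_error = np.cumsum(measured - bias - true_rate) * dt
    print(f"final drift, uncorrected: {heading_error[-1]:.4f} rad")
    print(f"final drift, bias removed: {corrected_error[-1]:.4f} rad")
```

In the simulation the bias is known exactly, so it can simply be subtracted. On real hardware the bias has to be estimated, which is the quantity the abstract proposes to learn.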
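Cross-view geolocalization localizes an agent within a search area by matching images taken from a ground-view camera to overhead images taken from satellites or aircraft. Although the viewpoint disparity between ground and overhead images makes cross-view geolocalization challenging, significant progress has been made under the assumption that the ground agent has access to a panoramic camera. For example, our prior work (WAG) introduced changes in search area discretization, training loss, and particle filter weighting that enabled city-scale panoramic cross-view geolocalization. However, panoramic cameras are not widely used in existing robotic platforms due to their complexity and cost. Non-panoramic cross-view geolocalization is more applicable to robotics, but is also more challenging. This paper presents Restricted FOV Wide-Area Geolocalization (ReWAG), a cross-view geolocalization approach that generalizes WAG for use with standard, non-panoramic ground cameras by creating pose-aware embeddings and providing a strategy to incorporate particle pose into the Siamese network. ReWAG is a neural network and particle filter system that is able to globally localize a mobile agent in a GPS-denied environment with only odometry and a 90-degree FOV camera, achieving localization accuracy similar to that achieved with a panoramic camera and improving localization accuracy by a factor of 100 compared to a baseline vision transformer (ViT) approach. A video highlight that demonstrates convergence on a test path of several dozen kilometers is available at https://youtu.be/u_obqrt8qce.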
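Learning representations of Electronic Health Records (EHRs) is a prominent yet under-explored research topic. It benefits various clinical decision support applications, such as medication outcome prediction or patient similarity search. Current approaches focus on task-specific label supervision on vectorized sequential EHR, which is not applicable to large-scale unsupervised scenarios. Recently, contrastive learning has shown great success on self-supervised representation learning problems. However, complex temporality often degrades its performance. We propose Graph Kernel Infomax, a self-supervised graph kernel learning approach on the graphical representation of EHR, to overcome these problems. Unlike the state of the art, we do not change the graph structure to construct augmented views. Instead, we use Kernel Subspace Augmentation to embed nodes into two geometrically different manifold views. The entire framework is trained by contrasting node and graph representations on those two manifold views through commonly used contrastive objectives. Empirically, using publicly available benchmark EHR datasets, our approach yields performance on clinical downstream tasks that exceeds the state of the art. Theoretically, the variation in distance metrics naturally creates different views as data augmentation without changing graph structures.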
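As a rough illustration of what a "commonly used contrastive objective" over two views can look like, the sketch below computes an InfoNCE-style loss between two sets of node embeddings. The abstract does not say which objective the authors use, so the choice of InfoNCE, the function name `info_nce`, the temperature, and the random stand-in embeddings are all assumptions.

```python
import numpy as np


def info_nce(view_a, view_b, temperature=0.5):
    """InfoNCE loss between two (n_nodes, dim) embedding views.

    Row i of view_a and row i of view_b form the positive pair; all other
    rows of view_b act as negatives for row i of view_a.
    """
    a = view_a / np.linalg.norm(view_a, axis=1, keepdims=True)
    b = view_b / np.linalg.norm(view_b, axis=1, keepdims=True)
    logits = a @ b.T / temperature               # scaled cosine similarities
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    nodes = rng.standard_normal((8, 16))
    # Two stand-in "views": small perturbations of the same node embeddings.
    view_a = nodes + 0.05 * rng.standard_normal(nodes.shape)
    view_b = nodes + 0.05 * rng.standard_normal(nodes.shape)
    aligned = info_nce(view_a, view_b)
    mismatched = info_nce(view_a, view_b[::-1])
    print(f"aligned views: {aligned:.3f}, mismatched views: {mismatched:.3f}")
```

The loss is lower when row i of one view matches row i of the other, which is the signal a contrastive encoder is trained to maximize.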
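Gradient estimation is often necessary for fitting generative models with discrete latent variables, in contexts such as reinforcement learning and variational autoencoder (VAE) training. The DisARM estimator (Yin et al. 2020; Dong, Mnih, and Tucker 2020) achieves state-of-the-art gradient variance for Bernoulli latent variable models in many contexts. However, DisARM and other estimators have potentially exploding variance near the boundary of the parameter space, where solutions tend to lie. To ameliorate this issue, we propose a new gradient estimator, bitflip-1, that has lower variance at the boundaries of the parameter space. As bitflip-1 has complementary properties to existing estimators, we introduce an aggregated estimator, unbiased gradient variance clipping (UGC), that uses either a bitflip-1 or a DisARM gradient update for each coordinate. We theoretically prove that UGC has uniformly lower variance than DisARM. Empirically, we observe that UGC achieves the optimal value of the optimization objectives in toy experiments, discrete VAE training, and a best subset selection problem.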
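The boundary problem described here can be seen even with the plain score-function (REINFORCE) estimator, which is neither DisARM nor bitflip-1. The sketch below assumes a single Bernoulli variable parameterized directly by its probability p and a toy objective chosen for illustration. It compares the sampled variance of the estimator with its closed form as p approaches 0.

```python
import numpy as np


def score_function_grad(f, p, n_samples, rng):
    """Single-sample score-function estimates of d/dp E_{b~Bern(p)}[f(b)]."""
    b = (rng.random(n_samples) < p).astype(float)
    # d/dp log Bern(b; p) = (b - p) / (p * (1 - p))
    score = (b - p) / (p * (1.0 - p))
    return f(b) * score


def exact_variance(f, p):
    """Closed-form variance of the single-sample estimator above."""
    f1, f0 = f(np.array(1.0)), f(np.array(0.0))
    second_moment = f1**2 / p + f0**2 / (1.0 - p)
    return float(second_moment - (f1 - f0) ** 2)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    f = lambda b: (b - 0.45) ** 2   # toy objective (assumed for illustration)
    for p in (0.5, 0.1, 0.01, 0.001):
        g = score_function_grad(f, p, 200_000, rng)
        print(f"p={p:<6} mean={g.mean():+.3f} "
              f"sample var={g.var():.2f} exact var={exact_variance(f, p):.2f}")
```

The estimator stays unbiased (its mean is f(1) - f(0) for every p), but the f(1)^2 / p term makes the variance grow without bound as p approaches 0. This is the kind of behavior a boundary-aware estimator aims to avoid.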